# Suppress warnings
import warnings
warnings.simplefilter('ignore')
warnings.filterwarnings('ignore')
# Search & Access
import earthaccess
from pprint import pprint
import xarray as xr
import hvplot.xarray #plot
# Harmony services
import geopandas as gpd
import geoviews as gv
gv.extension('bokeh', 'matplotlib')
from harmony import BBox, Client, Collection, Request, LinkType
import datetime as dt
import s3fs
%matplotlib inline
NASA Earthdata Cloud Clinic
Summary
Welcome to the NASA Earthdata Cloud Clinic!
We will focus on NASA Earthdata search & access.
We will use earthaccess for data search and direct cloud access, followed by xarray for subsetting. Both are open source Python libraries. We will also discover data using Earthdata Search.
We will be accessing data directly from Amazon Web Services (AWS), specifically in the us-west-2 region, which is where all cloud-hosted NASA Earthdata reside. This shared compute environment (JupyterHub) is also running in the same location. We will then load the data into Python as an xarray dataset.
See the bottom of the notebook for additional resources, including several tutorials that served as a foundation for this clinic.
A note on earthaccess python library
In this example we will use the earthaccess library to search for data collections from NASA Earthdata. earthaccess is a Python library that simplifies data discovery and access to NASA Earth science data by providing an abstraction layer for NASA’s Common Metadata Repository (CMR) Search API. The library makes searching for data more approachable by using a simpler notation instead of low-level HTTP queries. earthaccess takes the trouble out of Earthdata Login authentication, makes search easier, and provides a streamlined way to download or stream search results into an xarray object for easy data access. It can be used on and off the cloud.
For more on earthaccess visit the earthaccess documentation site. Be aware that earthaccess is under active development, and your use and feedback help improve it!
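To make the "low-level HTTP queries" concrete, here is a minimal sketch of the kind of CMR granule search URL that a library like earthaccess saves us from writing by hand. The helper name `build_cmr_granule_query` is hypothetical, and the exact request earthaccess sends may differ; the sketch only builds the URL and does not contact CMR.

```python
from urllib.parse import urlencode

# Public CMR granule search endpoint (JSON response format).
CMR_GRANULE_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_cmr_granule_query(short_name, start, end, page_size=10):
    """Return the URL for a CMR granule search (hypothetical helper).

    start and end are YYYY-MM-DD date strings; CMR expects the temporal
    range as a single "start,end" value.
    """
    params = {
        "short_name": short_name,
        "temporal": f"{start}T00:00:00Z,{end}T23:59:59Z",
        "page_size": page_size,
    }
    # urlencode percent-encodes the ":" and "," characters in the values.
    return f"{CMR_GRANULE_URL}?{urlencode(params)}"


url = build_cmr_granule_query(
    "SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205",
    "2021-07-01",
    "2021-09-30",
)
print(url)
```

With earthaccess, the same intent is expressed through keyword arguments such as short_name and temporal, without assembling or encoding the query string ourselves.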
A note on subsetting
In addition to directly accessing the files archived and distributed by each of the NASA DAACs, many datasets also support services that allow us to customize the data via subsetting, reformatting, reprojection/regridding, and file aggregation. What does subsetting mean? Here’s a generalized graphic of what we mean.

A note on jargon:
“direct cloud access” goes by several other names including “direct S3 access”, “direct access”, “direct in-region access”, “in-cloud data streaming”. And “subsetting” is also called “transformation”.
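As a rough illustration of what subsetting means in practice, the sketch below builds a small synthetic global grid and keeps only one variable, a bounding box, and a two-day window. The variable name `sst`, the 1-degree grid, and the random values are assumptions made for illustration; this is not real NASA data.

```python
import numpy as np
import xarray as xr

# Synthetic 1-degree global grid; values are random placeholders, not real data.
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(-179.5, 180.0, 1.0)
time = np.array(["2023-07-01", "2023-07-02", "2023-07-03"], dtype="datetime64[ns]")
rng = np.random.default_rng(0)
ds_demo = xr.Dataset(
    {"sst": (("time", "lat", "lon"), rng.random((time.size, lat.size, lon.size)))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Variable subset: pick one variable.
# Spatial subset: keep a latitude/longitude bounding box.
# Temporal subset: keep a range of dates (label slices include both ends).
subset = ds_demo["sst"].sel(
    lat=slice(18, 32),
    lon=slice(-99, -78),
    time=slice("2023-07-02", "2023-07-03"),
)
print(ds_demo["sst"].shape, "->", subset.shape)
```

Here the full array of shape (3, 180, 360) is reduced to (2, 14, 21). Subsetting services apply the same idea on the server side, so that only the reduced data needs to be transferred.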
Learning Objectives
- Utilize the earthaccess Python library to search for data using spatial and temporal filters and explore search results
- Stream data (i.e. perform in-region direct access of data) from an Amazon Simple Storage Service (S3) bucket where NASA Earthdata data is archived into our own cloud workspace, here in the Jupyter Lab/Notebook
- Extract variables and spatial slices from an xarray dataset
- Plot data
- Conceptualize data subsetting services provided by NASA Earthdata, including Harmony
- Plot a polygon geojson file with a basemap using geoviews
Prerequisites
First we’ll import python packages and set our authentication that will be used for both of our access and subsetting methods.
This tutorial is meant to be run in the AWS cloud in the us-west-2 region. You’ll need to be aware that data in NASA’s Earthdata Cloud reside in Amazon Web Services (AWS) Simple Storage Service (S3) buckets. Access is provided via temporary credentials; this free access is limited to requests made within the US West (Oregon) (code: us-west-2) AWS region. While this compute location is required for direct S3 access, all data in Earthdata Cloud are still freely available via download.
Import Required Packages
Authentication for NASA Earthdata
An Earthdata Login account is required to access data from the NASA Earthdata system. If you don’t already have one, visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.
The first step is to get the correct authentication that will allow us to get cloud-hosted data from NASA. This is all done through Earthdata Login. We can use the earthaccess library here, where the login method also gets the correct AWS credentials.
The first time you run this, it will ask for your Earthdata Login username and password and store them in a .netrc file. After that, each time you authenticate with auth = earthaccess.login(), it will log you in automatically.
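If you are unsure whether credentials have already been stored, a minimal sketch like the following can check for an Earthdata Login entry in .netrc using only the standard library. The helper name `has_earthdata_credentials` is hypothetical, and the sketch assumes the entry is stored under the machine name urs.earthdata.nasa.gov in the default ~/.netrc location.

```python
import netrc
from pathlib import Path

# Earthdata Login host; assumed to be the machine name stored in .netrc.
EDL_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_credentials(netrc_path=None):
    """Return True if a .netrc entry exists for Earthdata Login (hypothetical helper)."""
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        entry = netrc.netrc(str(path)).authenticators(EDL_HOST)
    except (netrc.NetrcParseError, OSError):
        # Unreadable or malformed file: treat as "no usable credentials".
        return False
    return entry is not None


print("Stored Earthdata Login credentials found:", has_earthdata_credentials())
```

The check only reports whether an entry exists; it does not verify that the stored username and password are still valid.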
# auth = earthaccess.login(strategy="interactive", persist=True)
auth = earthaccess.login()
1. earthaccess + xarray
earthaccess is an open-source Python library that simplifies Earthdata Cloud search and access.
Search for data
There are multiple keywords we can use to discover data from collections, such as short_name, concept_id, and doi. The table below contains the short_name for some collections we are interested in for other exercises. Each of these can be used to search for data or information related to the collection we are interested in.
| Shortname | Description | Example Temporal/Spatial parameters |
|---|---|---|
| SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205 | Gridded Sea Surface Height Anomalies (SSHA) on a 1/6th degree grid every 5 days | temporal=("2021-07-01", "2021-09-30") |
| MUR25-JPL-L4-GLOB-v04.2 | MUR Sea Surface Temperature | temporal=("2023-07-01", "2023-07-03"), bounding_box=(-99, 18.19232, -78.85547, 31.23754) |
| EMITL2BMIN | EMIT L2B estimated mineral identification and band depths in a spatially raw, non-orthocorrected format | temporal=("2023-07-01", "2023-07-31"), bounding_box=(-99, 18.19232, -78.85547, 31.23754) |
| ECO_L2G_LSTE | ECOSTRESS Gridded Land Surface Temperature and Emissivity Instantaneous Level 2 Global 70 m | temporal=("2023-07-01", "2023-07-03"), bounding_box=(-99, 18.19232, -78.85547, 31.23754) |
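The example parameters in the table map directly onto keyword arguments of earthaccess.search_data. The sketch below collects one row into a dictionary and checks the bounding box order, which is (west, south, east, north) in decimal degrees. The helper `check_bounding_box` is hypothetical and added only for illustration; the search call itself is left as a comment because it needs network access and an authenticated session.

```python
def check_bounding_box(bbox):
    """Validate a (west, south, east, north) tuple in decimal degrees (hypothetical helper)."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError(f"longitude out of range: {bbox}")
    if not (-90 <= south <= north <= 90):
        raise ValueError(f"latitude out of range or south > north: {bbox}")
    return bbox


# Parameters copied from the MUR25 row of the table above.
search_params = {
    "short_name": "MUR25-JPL-L4-GLOB-v04.2",
    "temporal": ("2023-07-01", "2023-07-03"),
    "bounding_box": check_bounding_box((-99, 18.19232, -78.85547, 31.23754)),
}

# With earthaccess imported and a logged-in session, the search would be:
# results = earthaccess.search_data(**search_params)
print(search_params)
```

Swapping latitude and longitude is a common mistake with bounding boxes, so a small check like this can save a confusing empty search result.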
But wait…You may be asking “how can we find the short_name for collections not in the table above?”.
–> Let’s take a quick detour and head to Earthdata Search GUI to gather more information about our dataset of interest. The dataset “short name” can be found by clicking on the Info button on our collection search result, and we can paste that into a python variable.
(Side Note: Both the earthaccess Python library and the Earthdata Search GUI leverage the Common Metadata Repository (CMR) API to search for collections and granules.)
Here we use the search_data function of earthaccess to query based on the short_name of interest, as well as other parameters such as temporal range:
data_name = "SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205"
results = earthaccess.search_data(
    short_name=data_name,
    cloud_hosted=True,
    temporal=("2021-07-01", "2021-09-30"),
)
Granules found: 18
According to PO.DAAC’s dataset landing page, gridded Sea Surface Height Anomalies (SSHA) above a mean sea surface are provided. The data are produced on a 1/6th degree grid every 5 days.
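As a quick sanity check on the granule count, the sketch below steps through the search window at a 5-day interval using only the standard library. It assumes the first time step in the window is 2021-07-05 12:00 and that the cadence is strictly regular; both are assumptions for illustration, not guarantees about the collection.

```python
from datetime import datetime, timedelta

# Assumptions: the first granule in the search window is stamped
# 2021-07-05 12:00 and granules follow at a fixed 5-day interval.
first_granule = datetime(2021, 7, 5, 12)
window_end = datetime(2021, 9, 30, 23, 59, 59)
step = timedelta(days=5)

granule_times = []
t = first_granule
while t <= window_end:
    granule_times.append(t)
    t += step

print(len(granule_times), granule_times[0].date(), granule_times[-1].date())
```

Under these assumptions the loop yields 18 time steps, the last on 2021-09-28, which is consistent with the 18 granules found by the search.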
We can discover more information about the matching files:
pprint(results[0])
Collection: {'Version': '2205', 'ShortName': 'SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'BoundingRectangles': [{'WestBoundingCoordinate': 0.083, 'SouthBoundingCoordinate': -79.917, 'EastBoundingCoordinate': 180, 'NorthBoundingCoordinate': 79.917}, {'WestBoundingCoordinate': -180, 'SouthBoundingCoordinate': -79.917, 'EastBoundingCoordinate': -0.083, 'NorthBoundingCoordinate': 79.917}]}}}
Temporal coverage: {'RangeDateTime': {'EndingDateTime': '2021-07-05T00:00:00.000Z', 'BeginningDateTime': '2021-07-05T00:00:00.000Z'}}
Size(MB): 9.307239532470703
Data: ['https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205/ssh_grids_v2205_2021070512.nc']
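The data link in the output above carries a timestamp in its file name. As a minimal sketch, assuming every granule in this collection ends with a YYYYMMDDHH stamp as this one does, the time can be recovered from the link without opening the file. The helper name `granule_time_from_link` is hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Data link copied from the search result printed above.
data_link = (
    "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/"
    "SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205/"
    "ssh_grids_v2205_2021070512.nc"
)


def granule_time_from_link(link):
    """Parse the trailing YYYYMMDDHH stamp from a granule file name (hypothetical helper)."""
    stem = PurePosixPath(urlparse(link).path).stem  # file name without ".nc"
    stamp = stem.rsplit("_", 1)[-1]  # text after the last underscore
    return datetime.strptime(stamp, "%Y%m%d%H")


print(granule_time_from_link(data_link))
```

This kind of parsing is handy for sorting or filtering a list of links, but the granule metadata remains the authoritative source for temporal coverage.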
Access data
Our code will work the same way if we are running it “in-region”, within our shared cloud environment, or locally from our laptop.
Since we are working in the AWS us-west-2 region, we can stream data directly to xarray. We are using the open_mfdataset() (multi-file) method, which combines the list of file objects returned by earthaccess.open() into a single dataset.
(Tip: to open a single file, for example when troubleshooting: ds = xr.open_dataset(earthaccess.open(results)[0]))
ds = xr.open_mfdataset(earthaccess.open(results))
ds
Opening 18 granules, approx size: 0.16 GB
using endpoint: https://archive.podaac.earthdata.nasa.gov/s3credentials
<xarray.Dataset> Size: 299MB
Dimensions: (Time: 18, Longitude: 2160, nv: 2, Latitude: 960)
Coordinates:
* Longitude (Longitude) float32 9kB 0.08333 0.25 0.4167 ... 359.8 359.9
* Latitude (Latitude) float32 4kB -79.92 -79.75 -79.58 ... 79.75 79.92
* Time (Time) datetime64[ns] 144B 2021-07-05T12:00:00 ... 2021-09-2...
Dimensions without coordinates: nv
Data variables:
Lon_bounds (Time, Longitude, nv) float32 311kB dask.array<chunksize=(1, 2160, 2), meta=np.ndarray>
Lat_bounds (Time, Latitude, nv) float32 138kB dask.array<chunksize=(1, 960, 2), meta=np.ndarray>
Time_bounds (Time, nv) datetime64[ns] 288B dask.array<chunksize=(1, 2), meta=np.ndarray>
SLA (Time, Latitude, Longitude) float32 149MB dask.array<chunksize=(1, 960, 2160), meta=np.ndarray>
SLA_ERR (Time, Latitude, Longitude) float32 149MB dask.array<chunksize=(1, 960, 2160), meta=np.ndarray>
Attributes: (12/21)
Conventions: CF-1.6
ncei_template_version: NCEI_NetCDF_Grid_Template_v2.0
Institution: Jet Propulsion Laboratory
geospatial_lat_min: -79.916664
geospatial_lat_max: 79.916664
geospatial_lon_min: 0.083333336
... ...
version_number: 2205
Data_Pnts_Each_Sat: {"16": 743215, "1007": 674076}
source_version: commit 58c7da13c0c0069ae940c33a82bf1544b7d991bf
SLA_Global_MEAN: 0.06428374482174487
SLA_Global_STD: 0.0905195660534004
latency: final
Plot the data
Let’s make a quick interactive plot of the data using an open source tool called hvplot. Because our data is 3D and has a time component, we can also preview the data over time using the hvplot slider.
ds.SLA.hvplot.image(x='Longitude', y='Latitude', cmap='RdBu', clim=(-2,2), title="Sea Level Anomaly Estimate (m)")
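To see what the slider is stepping through, the sketch below builds a small synthetic stand-in for ds.SLA with the same dimension names and pulls out a single time step. The grid size and random values are assumptions made for illustration. The idea is that a dimension not mapped to x or y (here Time) is exposed as a widget, and each slider position shows one 2D slice.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for ds.SLA with the same dimension names;
# the values are random placeholders, not real sea level anomalies.
times = np.array(
    ["2021-07-05T12", "2021-07-10T12", "2021-07-15T12"], dtype="datetime64[ns]"
)
lats = np.linspace(-79.5, 79.5, 160)
lons = np.linspace(0.5, 359.5, 360)
rng = np.random.default_rng(0)
sla = xr.DataArray(
    rng.normal(0.0, 0.1, size=(times.size, lats.size, lons.size)),
    dims=("Time", "Latitude", "Longitude"),
    coords={"Time": times, "Latitude": lats, "Longitude": lons},
    name="SLA",
)

# Each slider position corresponds to one 2D (Latitude, Longitude) slice.
first_frame = sla.isel(Time=0)
print(sla.dims, "->", first_frame.dims)
```

On the real dataset, ds.SLA.isel(Time=0) would select the first time step in the same way, which can be useful for a static plot of a single date.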